AITopics | data artifact

data artifact

RADAR: Benchmarking Language Models on Imperfect Tabular Data

Neural Information Processing Systems, Jun-20-2026, 21:21:29 GMT

Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness--the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies--remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compromise the validity of analytical conclusions. To address this gap, we present RADAR, a benchmark for systematically evaluating data-aware reasoning on tabular data. We develop a framework to simulate data artifacts via programmatic perturbations to enable targeted evaluation of model behavior. RADAR comprises 2,980 table-query pairs, grounded in real-world data spanning 9 domains and 5 data artifact types. In addition to evaluating artifact handling, RADAR systematically varies table size to study how reasoning performance holds up as table size increases. Our evaluation reveals that, despite decent performance on tables without data artifacts, frontier models degrade significantly when data artifacts are introduced, exposing critical gaps in their capacity for robust, data-aware analysis. Designed to be flexible and extensible, RADAR supports diverse perturbation types and controllable table sizes, offering a valuable resource for advancing tabular reasoning.
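To make "programmatic perturbations" concrete, here is a minimal sketch of how three of the artifact types named in the abstract (missing values, outliers, logical inconsistencies) could be injected into a clean table. This is not RADAR's framework: the table, the column names, and the functions `inject_missing`, `inject_outlier`, and `inject_inconsistency` are all hypothetical and chosen only for illustration.

```python
import copy
import random

# Hypothetical clean table: one row per day of activity tracking.
CLEAN = [
    {"day": 1, "walk_min": 30, "run_min": 10, "total_min": 40},
    {"day": 2, "walk_min": 45, "run_min": 0, "total_min": 45},
    {"day": 3, "walk_min": 20, "run_min": 25, "total_min": 45},
    {"day": 4, "walk_min": 60, "run_min": 15, "total_min": 75},
    {"day": 5, "walk_min": 35, "run_min": 5, "total_min": 40},
]


def inject_missing(rows, column, fraction, rng):
    """Return a copy with about `fraction` of `column` set to None."""
    out = copy.deepcopy(rows)
    k = max(1, round(fraction * len(out)))
    for i in rng.sample(range(len(out)), k):
        out[i][column] = None
    return out


def inject_outlier(rows, column, row_index, factor=100):
    """Return a copy with one value scaled to an implausible magnitude."""
    out = copy.deepcopy(rows)
    out[row_index][column] *= factor
    return out


def inject_inconsistency(rows, row_index):
    """Return a copy where total_min no longer equals walk_min + run_min."""
    out = copy.deepcopy(rows)
    row = out[row_index]
    row["total_min"] = row["walk_min"] + row["run_min"] - 5
    return out


rng = random.Random(0)  # fixed seed so the perturbation is reproducible
with_missing = inject_missing(CLEAN, "walk_min", 0.4, rng)
with_outlier = inject_outlier(CLEAN, "run_min", 0)
with_conflict = inject_inconsistency(CLEAN, 2)
```

Each function returns a perturbed copy and leaves the clean table untouched, so a query can be answered against both versions and the answers compared.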

large language model, machine learning, natural language, (23 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Asia (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.67)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (0.93)
Government > Regional Government > North America Government > United States Government (0.92)
(4 more...)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

RADAR: Benchmarking Language Models on Imperfect Tabular Data

Neural Information Processing Systems, Jun-13-2026, 14:09:30 GMT

Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness--the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies--remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compromise the validity of analytical conclusions. To address this gap, we present RADAR, a benchmark for systematically evaluating data-aware reasoning on tabular data. We develop a framework to simulate data artifacts via programmatic perturbations to enable targeted evaluation of model behavior. RADAR comprises 2,980 table-query pairs, grounded in real-world data spanning 9 domains and 5 data artifact types. In addition to evaluating artifact handling, RADAR systematically varies table size to study how reasoning performance holds when increasing table size. Our evaluation reveals that, despite decent performance on tables without data artifacts, frontier models degrade significantly when data artifacts are introduced, exposing critical gaps in their capacity for robust, data-aware analysis. Designed to be flexible and extensible, RADAR supports diverse perturbation types and controllable table sizes, offering a valuable resource for advancing tabular reasoning.

artificial intelligence, data artifact, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.42)

RADAR: Benchmarking Language Models on Imperfect Tabular Data

Gu, Ken, Zhang, Zhihan, Lin, Kate, Zhang, Yuwei, Paruchuri, Akshay, Yu, Hong, Kazemi, Mehran, Ayush, Kumar, Heydari, A. Ali, Xu, Maxwell A., Narayanswamy, Girish, Liu, Yun, Poh, Ming-Zher, Yang, Yuzhe, Malhotra, Mark, Patel, Shwetak, Palangi, Hamid, Xu, Xuhai, McDuff, Daniel, Althoff, Tim, Liu, Xin

arXiv.org Artificial Intelligence, Nov-3-2025

Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies -- remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compromise the validity of analytical conclusions. To address this gap, we present RADAR, a benchmark for systematically evaluating data-aware reasoning on tabular data. We develop a framework to simulate data artifacts via programmatic perturbations to enable targeted evaluation of model behavior. RADAR comprises 2,980 table-query pairs, grounded in real-world data spanning 9 domains and 5 data artifact types. In addition to evaluating artifact handling, RADAR systematically varies table size to study how reasoning performance holds up as table size increases. Our evaluation reveals that, despite decent performance on tables without data artifacts, frontier models degrade significantly when data artifacts are introduced, exposing critical gaps in their capacity for robust, data-aware analysis. Designed to be flexible and extensible, RADAR supports diverse perturbation types and controllable table sizes, offering a valuable resource for advancing tabular reasoning.

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2506.08249

Country:

North America > United States (1.00)
Asia (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (0.93)
Government > Regional Government > North America Government > United States Government (0.92)
(4 more...)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

No More Distractions: an Adaptive Up-Sampling Algorithm to Reduce Data Artifacts

Chen, Han

arXiv.org Artificial Intelligence, Jan-24-2024

Researchers have recently found that language models sometimes achieve high accuracy on a benchmark data set yet fail to generalize when the original data set is changed even slightly. This is sometimes due to data artifacts: the model learns spurious correlations between tokens and labels instead of the semantics and logic. In this work, we analyzed the SNLI data and visualized such spurious correlations. We proposed an adaptive up-sampling algorithm to correct the data artifacts; it is simple and effective and needs no human edits or annotation. In an experiment, we applied the algorithm to fix the data artifacts in the SNLI data, and the model trained on the corrected data performed significantly better than the model trained on the raw SNLI data, both overall and on the subset we corrected.
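The abstract does not spell out the algorithm, so the following is only an illustrative guess at the general idea of up-sampling against a token-label correlation, not the paper's method. It counts how often a token co-occurs with each label and duplicates examples from the under-represented labels until the token no longer favors one label. The toy sentences and the functions `label_counts_by_token` and `upsample_token` are hypothetical.

```python
from collections import Counter, defaultdict


def label_counts_by_token(examples):
    """Map each token to a Counter of the labels it co-occurs with."""
    counts = defaultdict(Counter)
    for text, label in examples:
        for token in set(text.lower().split()):
            counts[token][label] += 1
    return counts


def upsample_token(examples, token):
    """Duplicate examples containing `token` until every label that already
    occurs with the token occurs with it equally often."""
    with_token = [ex for ex in examples if token in ex[0].lower().split()]
    per_label = Counter(label for _, label in with_token)
    if not per_label:
        return list(examples)
    target = max(per_label.values())
    extra = []
    for label, n in per_label.items():
        pool = [ex for ex in with_token if ex[1] == label]
        # Cycle through the pool so the duplicates are spread evenly.
        extra.extend(pool[i % len(pool)] for i in range(target - n))
    return list(examples) + extra


# Toy data in which "not" co-occurs mostly with the contradiction label.
examples = [
    ("a man is not sleeping", "contradiction"),
    ("the dog is not running", "contradiction"),
    ("nobody is not here", "contradiction"),
    ("she is not far away", "entailment"),
    ("a cat sits outside", "neutral"),
]
balanced = upsample_token(examples, "not")
not_counts = label_counts_by_token(balanced)["not"]
```

After the call, "not" appears equally often with both labels it originally co-occurred with. A model trained on `balanced` can no longer use the token alone as a shortcut between those two labels.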

accuracy, correlation, training data, (12 more...)

arXiv.org Artificial Intelligence

2401.13907

Country:

Asia > China > Hong Kong (0.05)
North America > United States > Oregon > Multnomah County > Portland (0.04)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

Probabilistic detection of short events, with application to critical care monitoring

Neural Information Processing Systems, Apr-6-2023, 14:13:07 GMT

We describe an application of probabilistic modeling and inference technology to the problem of analyzing sensor data in the setting of an intensive care unit (ICU). In particular, we consider the arterial-line blood pressure sensor, which is subject to frequent data artifacts that cause false alarms in the ICU and make the raw data almost useless for automated decision making. The problem is complicated by the fact that the sensor data are acquired at fixed intervals whereas the events causing data artifacts may occur at any time and have durations that may be significantly shorter than the data collection interval. We show that careful modeling of the sensor, combined with a general technique for detecting sub-interval events and estimating their duration, enables effective detection of artifacts and accurate estimation of the underlying blood pressure values.
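The sub-interval idea can be illustrated with a deliberately simplified, deterministic calculation, which is not the probabilistic model the paper describes. It assumes that each reading is the average of the signal over the collection interval, that the artifact holds the signal at a known level, and that the true value can be estimated from neighboring readings. All of these are assumptions, as are the function names and the example numbers. Under them, a reading is a mix of the baseline and the artifact level, and the mixing weight gives the event duration.

```python
def event_fraction(reading, baseline, artifact_level):
    """Fraction of the interval occupied by the artifact, clipped to [0, 1].

    Assumed mixing model:
        reading = (1 - f) * baseline + f * artifact_level
    """
    if baseline == artifact_level:
        raise ValueError("artifact is indistinguishable from the baseline")
    fraction = (baseline - reading) / (baseline - artifact_level)
    return min(1.0, max(0.0, fraction))


def event_duration(reading, baseline, artifact_level, interval_s):
    """Estimated duration of the artifact in seconds."""
    return interval_s * event_fraction(reading, baseline, artifact_level)


# Hypothetical numbers: neighboring readings suggest a baseline of 90 mmHg,
# the artifact drives the sensor to 0 mmHg, and readings arrive every 60 s.
duration = event_duration(reading=75.0, baseline=90.0,
                          artifact_level=0.0, interval_s=60.0)
```

With these numbers the reading sits one sixth of the way from the baseline to the artifact level, so the estimate is about 10 seconds, well below the 60-second collection interval.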

application, critical care, probabilistic detection, (3 more...)

Neural Information Processing Systems

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)

PADL: portable PyTorch pipelines facilitating deep-learning model use

#artificialintelligence, Apr-8-2022, 00:15:27 GMT

Programs are read more often than they are written. Models are used more often than they are trained. The PyTorch ecosystem, and the deep-learning ecosystem in general, abounds with tools for training models and for squeezing the best performance out of computational resources while doing so. In the life cycle of a model, this is only the beginning of the journey. Once a model has been trained, it will be shared and used in a multitude of contexts, often on a daily basis, in operations, evaluation, comparison, and experimentation by data scientists.

padl, pipeline, pytorch layer, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Launch of the SandLabs Project

#artificialintelligence, Aug-3-2021, 22:20:07 GMT

The SandLabs Team currently consists of Wyatt Walsh and Ryan Epprecht. Having met in high school, this dynamic duo has a long history together, and each member brings a rich set of experiences and skills to the team. Navigate to their various profiles if you are interested in learning more about Wyatt or Ryan. SandLabs aims to explore the blockchain domain through a data-science lens to generate new insights and make helpful contributions to the BlockchainxData communities and beyond. The initial focus of our work will be collecting, extracting, and processing high-quality data for future use.

architecture, sandlab, sandlab project, (6 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence (0.76)

Starting your data science project with Metaflow? The MNIST use-case.

#artificialintelligence, May-3-2021, 19:02:46 GMT

Data scientists' priority is picking the right features and building and deploying their models. They would rather not be bothered with other aspects such as model versioning, job scheduling, flow architecture, and compute resource management, all of which are needed to operationalize data science successfully. Metaflow is an open-source tool by Netflix for managing data science workflows. It aims to boost the productivity of data scientists by letting them focus on actual data science work and by speeding up the productionization of their deliverables. If you are familiar with Airflow or Luigi, you will recognize the role Metaflow plays. It lets you run your data science process in steps: each step is a node, and the nodes are connected as a graph.
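The "steps as nodes in a graph" idea can be sketched with the Python standard library alone. This is not Metaflow's API: the step functions, the `DEPENDS_ON` mapping, and `run_flow` are hypothetical stand-ins meant only to show a workflow expressed as a dependency graph and executed in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical steps; each one reads and updates a shared state dict.
def start(state):
    state["raw"] = [3, 1, 2]


def prepare(state):
    state["clean"] = sorted(state["raw"])


def train(state):
    # Stand-in for model fitting: just the mean of the prepared data.
    state["model"] = sum(state["clean"]) / len(state["clean"])


def end(state):
    state["done"] = True


STEPS = {"start": start, "prepare": prepare, "train": train, "end": end}

# Each step maps to the set of steps that must finish before it runs.
DEPENDS_ON = {
    "start": set(),
    "prepare": {"start"},
    "train": {"prepare"},
    "end": {"train"},
}


def run_flow():
    """Run every step once, in an order that respects the dependencies."""
    state = {}
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        STEPS[name](state)
    return order, state


order, state = run_flow()
```

Keeping the graph separate from the step bodies is what lets a workflow tool schedule, retry, or version each node independently, which are the operational concerns the article says data scientists would rather delegate.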

data science project, metaflow, mnist use-case, (10 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.77)

Probabilistic detection of short events, with application to critical care monitoring

Aleks, Norm, Russell, Stuart J., Madden, Michael G., Morabito, Diane, Staudenmayer, Kristan, Cohen, Mitchell, Manley, Geoffrey T.

Neural Information Processing Systems, Feb-15-2020, 00:58:46 GMT

application, critical care, probabilistic detection, (3 more...)

Neural Information Processing Systems

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)
